From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax

Author

  • Michael R. Brent
Abstract

Imagine a language that is completely unfamiliar; the only means of studying it are an ordinary grammar book and a very large corpus of text. No dictionary is available. How can easily recognized, surface grammatical facts be used to extract from a corpus as much syntactic information as possible about individual words? This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the data collected using the cues, rather than drawing a categorical conclusion from a single occurrence of a cue. The effectiveness of this approach for inferring the syntactic frames of verbs is supported by experiments on an English corpus using a program called Lerner. Lerner starts out with no knowledge of content words--it bootstraps from determiners, auxiliaries, modals, prepositions, pronouns, complementizers, coordinating conjunctions, and punctuation.
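The second principle, treating cues as probabilistic evidence and testing them statistically, can be pictured with a small sketch. The abstract does not say which statistic Lerner applies, so the binomial tail test below is an assumption made for illustration, as are the function names, the cue error rate, and the significance threshold.

```python
from math import comb


def binomial_tail(n: int, m: int, p: float) -> float:
    """Return P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


def takes_frame(n_occurrences: int, n_with_cue: int,
                error_rate: float = 0.05, alpha: float = 0.02) -> bool:
    """Decide whether a verb plausibly takes a syntactic frame.

    n_occurrences: how often the verb was seen in the corpus.
    n_with_cue:    how often a cue for the frame appeared with it.
    error_rate:    assumed chance that a cue fires spuriously (hypothetical value).
    alpha:         assumed significance threshold (hypothetical value).

    The frame is accepted only if seeing this many cues from spurious
    firings alone would be improbable.
    """
    return binomial_tail(n_occurrences, n_with_cue, error_rate) < alpha


# A single cue in 100 occurrences is easily explained by cue error,
# while 15 cues in 100 occurrences is not.
print(takes_frame(100, 1))
print(takes_frame(100, 15))
```

The design point is that no single cue occurrence settles the question: the decision depends on how the count of cues compares with what cue error alone would produce.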


Similar resources

A Subcategorisation Lexicon for German Verbs induced from a Lexicalised PCFG

The paper presents a large-scale computational subcategorisation lexicon for several thousand German verbs. The lexical entries were obtained by unsupervised learning in a statistical grammar framework: a German context-free grammar containing frame-predicting grammar rules and information about lexical heads was trained on 18.7 million words of a large German newspaper corpus. We developed a s...


Invited Talk: Lexicon Embedded Syntax

This paper explores the notion of lexicon embedded syntax: syntactic structures that are preassembled in natural language lexicons. Section 1 proposes a lexicological perspective on (dependency) syntax: first, it deals with the well-known problem of lexicon-grammar dichotomy, then introduces the notion of lexicon embedded syntax and, finally, presents the lexical models this discussion is based...


Unsupervised Lexical Learning With Categorial Grammars

In this paper we report on an unsupervised approach to learning Categorial Grammar (CG) lexicons. The learner is provided with a set of possible lexical CG categories, the forward and backward application rules of CG and unmarked positive only corpora. Using the categories and rules, the sentences from the corpus are probabilistically parsed. The parses and the history of previously parsed sent...


T&F Proofs: Not For Distribution

A central assumption in generative grammar research on the relationship between syntax and the lexicon is that syntax is a projection of the lexicon. The structure of sentences is a reflection of the lexical properties of the individual lexical items they contain. In the standard view, each lexical item is associated with a lexical entry that contains three kinds of information, as indicated i...


Lexical Attraction Models of Language

This paper presents lexical attraction models of language, in which the only explicitly represented linguistic knowledge is the likelihood of pairwise relations between words. This is in contrast with models that represent linguistic knowledge in terms of a lexicon, which assigns categories to each word, and a grammar, which expresses possible combinations in terms of these categories. The word...



Journal title:
  • Computational Linguistics

Volume 19  Issue 

Pages  -

Publication date 1993